Task 1

a)

In this task, I am given accelerometer data for different gestures. Visualizing the raw acceleration values would not be very informative, so with the code below I calculated the displacement along each axis. The 3D plots are interactive and can be rotated with the mouse.
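
As a sketch of the idea, displacement can be recovered from acceleration by double numerical integration, i.e. two cumulative sums. The function below is a minimal illustration assuming a fixed sampling interval `dt`; the names (`acc_to_disp`, `acc_x`) are illustrative and not the assignment's actual code.

```r
# Recover displacement from raw acceleration by double numerical
# integration (two cumulative sums). Assumes a fixed sampling
# interval dt; names here are illustrative stand-ins.
acc_to_disp <- function(acc, dt = 0.1) {
  vel  <- cumsum(acc) * dt   # integrate acceleration -> velocity
  disp <- cumsum(vel) * dt   # integrate velocity -> displacement
  disp
}

# Example: constant unit acceleration gives a quadratic-like path
acc_x  <- rep(1, 10)
disp_x <- acc_to_disp(acc_x, dt = 1)   # disp_x[10] is 55
```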

You can see from the plots that the reconstructed trajectories resemble the gestures they represent.

b)

I tried two distance measures for the KNN algorithm: Manhattan distance and Euclidean distance. I wrote a KNN function called “knnfunc” whose inputs are the training data, the test data, the value of k, and the distance measure.
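
A minimal sketch of such a function is below; the interface (class label in the first column, `dist_type` = 1 for Manhattan and 2 for Euclidean, used as a Minkowski exponent) is my assumption about the general shape, not the exact original “knnfunc”.

```r
# KNN classifier sketch: traindat has the class label in column 1
# and features in the remaining columns; testdat holds features only.
# dist_type is a Minkowski exponent: 1 = Manhattan, 2 = Euclidean.
knnfunc <- function(traindat, testdat, k, dist_type = 2) {
  trainlab  <- traindat[, 1]
  trainfeat <- as.matrix(traindat[, -1])
  testfeat  <- as.matrix(testdat)
  apply(testfeat, 1, function(x) {
    # Minkowski distance from x to every training row
    d  <- rowSums(abs(sweep(trainfeat, 2, x))^dist_type)^(1 / dist_type)
    nn <- trainlab[order(d)[1:k]]
    names(which.max(table(nn)))   # majority vote among k nearest
  })
}

# Tiny example: two clusters, one per class
td <- data.frame(lab = c("a", "a", "b", "b"),
                 x = c(0, 0, 10, 10),
                 y = c(0, 1, 10, 11))
knnfunc(td, matrix(c(0, 0.5), nrow = 1), k = 3, dist_type = 1)  # "a"
```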

I combined the coordinate information into a single matrix called “train”. Then, with the code below, I applied 10-fold cross-validation for both distance measures. I used a single repetition, since these calculations are time consuming; the number of repetitions can be changed via “nofReplications”. I also normalized the data, since the accelerations may be on different scales.

train<-cbind(trainx,trainy[,-1],trainz[,-1])
train<-cbind(train[,1],scale(train[,-1]))
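
The cross-validation loop can be sketched generically as follows; `clf` is a stand-in for “knnfunc” (called with a fixed k and distance measure), the class label sits in column 1, and all names are illustrative:

```r
# Repeated k-fold cross-validation sketch. dat has the class label
# in column 1; clf(traindat, testfeat) returns predicted labels.
cv_accuracy <- function(dat, clf, nfolds = 10, reps = 1, seed = 1) {
  set.seed(seed)
  n    <- nrow(dat)
  accs <- numeric(reps)
  for (r in 1:reps) {
    folds   <- sample(rep(1:nfolds, length.out = n))  # random fold ids
    correct <- 0
    for (f in 1:nfolds) {
      pred    <- clf(dat[folds != f, , drop = FALSE],
                     dat[folds == f, -1, drop = FALSE])
      correct <- correct + sum(pred == dat[folds == f, 1])
    }
    accs[r] <- correct / n
  }
  mean(accs)   # accuracy averaged over repetitions
}
```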
Manhattan distance:

##     Klev      Accu
##  1:    1 0.9564732
##  2:    2 0.9564732
##  3:    3 0.9587054
##  4:    4 0.9575893
##  5:    5 0.9508929
##  6:    6 0.9531250
##  7:    7 0.9453125
##  8:    8 0.9497768
##  9:    9 0.9419643
## 10:   10 0.9430804

Euclidean distance:

##     Klev      Accu
##  1:    1 0.9441964
##  2:    2 0.9441964
##  3:    3 0.9441964
##  4:    4 0.9497768
##  5:    5 0.9408482
##  6:    6 0.9430804
##  7:    7 0.9375000
##  8:    8 0.9386161
##  9:    9 0.9341518
## 10:   10 0.9308036

The accuracy tables show that the best k for Manhattan distance is 3 and for Euclidean distance is 4. I can now use these values on the test data.

c)

Before moving on, I normalized the test data. Then I applied KNN with k = 3 and Manhattan distance, and KNN with k = 4 and Euclidean distance.
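
One detail worth noting: to keep both sets in the same units, the test features should be scaled with the centers and scales computed from the training data. A minimal sketch, with stand-in matrices:

```r
# Scale test features using the training set's column centers and
# scales, so train and test live in the same units. The matrices
# below are illustrative stand-ins for the real feature data.
set.seed(1)
train_feat <- matrix(rnorm(20, mean = 5), nrow = 5)  # stand-in train
test_feat  <- matrix(rnorm(8,  mean = 5), nrow = 2)  # stand-in test

train_scaled <- scale(train_feat)
test_scaled  <- scale(test_feat,
                      center = attr(train_scaled, "scaled:center"),
                      scale  = attr(train_scaled, "scaled:scale"))
```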

The accuracy, run time and confusion matrices can be found below.

Manhattan distance (k = 3):

## [1] 0.9508654
##    
##       1   2   3   4   5   6   7   8
##   1 432   0   0   2   0   3   0   0
##   2   1 451   0   0   0   0   0   0
##   3   2   0 415   0  12  20   5   0
##   4   3   0   0 384  48   8   0   7
##   5   3   0   4   2 422   2   0   0
##   6   6   0   4  12  27 400   0   0
##   7   1   0   2   0   0   0 444   0
##   8   0   0   0   1   1   0   0 458
##    user  system elapsed 
##  240.89    0.51  243.32

Euclidean distance (k = 4):

## [1] 0.945282
##    
##       1   2   3   4   5   6   7   8
##   1 431   0   0   2   0   4   0   0
##   2   1 449   0   0   0   0   2   0
##   3   1   0 416   0  15  16   6   0
##   4   5   0   0 372  60   7   0   6
##   5   3   0   7   2 419   2   0   0
##   6   3   0   3  15  29 398   1   0
##   7   0   0   3   0   0   0 444   0
##   8   0   0   0   2   1   0   0 457
##    user  system elapsed 
##  236.62    0.16  238.02

The accuracy is 95% for Manhattan distance and 94% for Euclidean distance, both quite high. The confusion matrices indicate a recurring problem with gesture 4 being classified as gesture 5. Run times for both algorithms are around 4 minutes. This could be reduced, since there are efficient KNN packages using Euclidean distance; however, I could not find one supporting other distance measures, so I wrote my own implementation.

Task 2

a)

This time we are given ECG readings. I used the “penalized” package and found the optimal L1 and L2 penalties by using 10-fold cross-validation; the package provides the functions optL1 and optL2 for this purpose. I did not feel the need to scale the data, since all features are ECG readings from humans and should be on comparable scales. I also recoded readings labeled “-1” as “0”, and used 0.5 as the decision threshold.
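
The tuning step might look roughly like the sketch below; `y01` (the 0/1 labels after recoding) and `X` (the matrix of ECG readings) are placeholders I introduce here, and the exact arguments should be checked against the package documentation (`?optL1`).

```r
# Sketch of penalty tuning with the penalized package, assuming
# y01 holds 0/1 labels (after recoding -1 to 0) and X is the
# matrix of ECG readings. Names are placeholders.
library(penalized)

cv1 <- optL1(response = y01, penalized = X, fold = 10, model = "logistic")
cv2 <- optL2(response = y01, penalized = X, fold = 10, model = "logistic")

fit  <- penalized(response = y01, penalized = X,
                  lambda1 = cv1$lambda, lambda2 = cv2$lambda,
                  model = "logistic")
prob <- predict(fit, penalized = X)    # fitted probabilities
pred <- as.integer(prob > 0.5)         # 0.5 decision threshold
```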

## [1] 0.82
##    
##      0  1
##   0 25 11
##   1  7 57

With this model, the accuracy is 82%. Treating “1” as the positive class, we have a relatively high number of false positives and fewer false negatives, and the test data contain more positives than negatives overall. Perhaps the cross-validation folds need to be stratified.

b)

I drew a plot showing one time series from each class together with the model coefficients.

Based on the plot, the coefficients seem to correspond to the times where changes happen, and they appear to indicate the direction of the change. The graph shows one class-0 series, one class-1 series, and the coefficients; the coefficients seem to capture the changes in both classes.

c)

Now, based on the observation in b), I calculated the differences between consecutive time series observations and fitted a new model.
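
The feature construction can be sketched as a row-wise `diff`, assuming one ECG series per row; the matrices below are illustrative stand-ins:

```r
# Replace each series (one per row) by its consecutive first
# differences before refitting the penalized model.
X_raw  <- rbind(c(1, 3, 6, 10),      # two illustrative series
                c(2, 2, 5, 4))
X_diff <- t(apply(X_raw, 1, diff))   # row-wise first differences
# X_diff row 1: 2 3 4; row 2: 0 3 -1
```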

## [1] 0.85
##    
##      0  1
##   0 27  9
##   1  6 58

With this model, the accuracy has increased and both the false positive and false negative counts have fallen.

d)

The differences vary more in class 1. Again, the coefficients try to capture the change, and this time they track the movements better. The fused lasso gives smooth coefficients (they do not change rapidly), and the ridge penalty shrinks the less useful ones toward zero. The coefficients do capture the big changes in the data.

I will share the code on Monday.